Abstract: In this paper, we investigate the problem of network backbone discovery. In complex systems, a "backbone" takes a central role in carrying out the system functionality and carries the bulk of system traffic. It also both simplifies and highlights the underlying network structure. Here, we propose an integrated graph-theoretical and information-theoretical network backbone model. We develop an efficient mining algorithm based on a Kullback-Leibler divergence optimization procedure and a maximal weight connected subgraph discovery procedure. A detailed experimental evaluation demonstrates both the effectiveness and the efficiency of our approach. Case studies in real-world domains further illustrate the usefulness of the discovered network backbones.

I. Introduction 

In many man-made complex networking systems, a "backbone" takes a central role in carrying out the system functionality. The clearest examples are the highway framework in the transportation system and the backbone of the Internet. When studying these systems, the backbone offers a view that is both concise and highlighted. Furthermore, it provides key insight into how the entire system is organized and works. Does a "backbone" exist in natural or social networks? What should it look like? How can we discover it efficiently? Interestingly, the backbone phenomenon has recently been observed in several natural and social systems, including metabolic networks [2], social networks [11], and food webs [19]. Unfortunately, none of the existing work gives a formal definition of the network backbone or a goodness function for evaluating one.

In this paper, we propose the first theoretical network backbone model under an integrated graph-theoretical and information-theoretical framework. Intuitively, a network backbone is a connected subgraph that carries the major network "traffic". It can simplify and highlight underlying structures and traffic flows in complex systems. A complex system's behavior relies on proper communication through the underlying network substrate, which invokes a sequence of local interactions between adjacent pairs of vertices. These interactions thus form system-wide network traffic [11]. It is essential for a complex system to deliver such traffic in an energy-efficient way [3]. Here, at least two energy costs should be considered: 1) the communication cost and 2) the organization cost of defining/recognizing a communication path. For the first cost, the energy-efficient way for one vertex to communicate with another naturally points to the shortest path, i.e., the information flow over a network primarily follows the shortest paths [7]. The second cost can be described as a path-recognition complexity. Based on information theory, we can describe it as the shortest coding length of a given path. In general, the optimal code length for an edge can be bounded by $\log \frac{1}{p(i \to j)}$, where $p(i \to j)$ is the probability of a shortest path taking edge $(i, j)$ when it goes through vertex $i$.

Minimizing these two costs gives rise to the network backbone structure: the shortest paths form a traffic flow which must efficiently travel from source to destination using the backbone. In particular, if only the first cost is considered, only the edges with high betweenness [7] will be selected (roughly speaking, edge betweenness defines the likelihood of any shortest path going through an edge). However, those edges are not necessarily connected, and how they should work together in delivering the system-wide traffic is unknown. In this respect, the second cost enables us to further constrain the backbone structure using the path-recognition complexity. A backbone that is too simple or has the wrong topology is not an efficient route for vertex-to-vertex paths (first cost). A backbone that is too complex makes shortest paths expensive to describe (second cost).

Figure 1 illustrates a backbone and its usage in reducing the description length of a shortest path. Since our computational framework implicitly evaluates a path's information complexity by statistical likelihood, it does not rely on any particular coding scheme (i.e., the Huffman code is used here only for illustrative purposes). Figure 1(a) shows a network with its highlighted backbone. Figure 1(b) focuses on a subgraph, showing the Huffman code for each edge, where the upper part is the edge code without utilizing the backbone and the lower part is the code using the backbone. For instance, edge $(9, 2)$ with direction $9 \to 2$ is assigned code 1, and with direction $2 \to 9$ is assigned code 000. In Figure 1(c), we show path codings with and without using the backbone. We simply list each edge code in the path consecutively. Specifically, the subpath $(2, 3, 4, 5)$ is in the backbone, and we use "[" and "]" to denote entering and exiting the backbone, respectively. It can also be observed that even with an extra coding cost for entering and exiting the backbone, the lower coding cost for paths inside the backbone can still be beneficial.

Fig. 1: Path encoding with and without backbone. (a) Network and its backbone; (b) Coding of subgraph; (c) Path coding example.


We note that the backbone discovery problem resonates with recent efforts in the graph mining community on graph simplification. To deal with the scale and complexity of graph data, reducing graph complexity, or graph simplification, is becoming an increasingly important research topic [25], [23]. Generally, graph simplification focuses on sparsifying graphs by removing non-essential edges [25], [23], extracting key vertices [22], [9], or collapsing substructures into supernodes. These simplified structures facilitate many real-world applications, such as topology visualization [16] and computational speedup on various graph-centered tasks [10], [22], [18]. From this perspective, the backbone structure can potentially serve as a simplification approach, which highlights a core set of vertices and edges in the original network.

II. Statistical Learning Framework for 
Backbones 

In this section, we introduce and refine the backbone 
description under a statistical learning framework. Based on 
the well-known relationship between information complexity 
and statistical likelihood, the information complexity of each 
path (or a set of paths) can be equivalently represented as 
a likelihood function. Further, this reformulation also gen- 
eralizes the notion of backbone to optimize the information 
complexity for any given set of paths (information pathway) 
beyond the shortest paths. 

A. Notations 

To facilitate our discussion, we introduce the following notation. Let $G = (V, E)$ be an undirected graph with edges $E \subseteq V \times V$. Since each edge $(u, v) \in E$ in the graph may be assigned two codes, one for $(u \to v)$ and another for $(v \to u)$ (see Figure 1), it is more convenient to consider each undirected edge $(u, v)$ as two directed edges $(u \to v)$ and $(v \to u)$. Therefore, we represent the undirected graph $G$ as a bidirected graph, i.e., $G = (V, \mathcal{E})$ where $\mathcal{E} = \bigcup_{(u,v) \in E} \{(u \to v), (v \to u)\}$. Note that when we say an edge $(u, v)$ is a backbone edge in an undirected graph, it means that both directed edges $(u \to v)$ and $(v \to u)$ from $\mathcal{E}$ are backbone edges in the bidirected graph. The same holds for non-backbone edges. This constraint does not hold for directed graphs, where each directed edge is evaluated independently. Though this paper mainly focuses on undirected graphs, the backbone discovery framework can be easily generalized to directed graphs.

For a vertex $u \in V$, let $\mathcal{N}(u)$ be the immediate neighbors of $u$, i.e., $\mathcal{N}(u) = \{v \mid (u \to v) \in \mathcal{E}\}$. Let $\mathcal{I}(u)$ be all the incoming edges of vertex $u$, i.e., $\mathcal{I}(u) = \{(w \to u) \mid w \in \mathcal{N}(u)\}$, and let $\mathcal{O}(u)$ be all the outgoing edges of vertex $u$, i.e., $\mathcal{O}(u) = \{(u \to v) \mid v \in \mathcal{N}(u)\}$. Note that, in bidirected graphs, $|\mathcal{I}(u)| = |\mathcal{O}(u)|$. A path $P$ from vertex $u$ to vertex $v$ can be defined as a vertex-edge sequence, i.e., $(u_0, e_1, u_1, e_2, \dots, e_k, u_k)$, where $u = u_0$, $v = u_k$, and $(u_{i-1} \to u_i) = e_i$. When no confusion arises, we use only the edge sequence to describe the path, as $(e_1, e_2, \dots, e_k)$. In this paper, we only consider simple paths, in which no vertex appears more than once. The path length is defined as the number of edges in the path. The shortest path from vertex $u$ to $v$ is the one with minimal path length, denoted $P_{uv}$.

Let $\mathcal{P}$ be a collection of paths in graph $G$ characterizing system-wide information flow. Without prior knowledge, we consider $\mathcal{P}$ to contain the shortest paths for each pair of vertices in graph $G$, i.e., $\mathcal{P} = \{P_{uv}\}$. In case there is more than one shortest path between a pair of vertices, only one shortest path is added to $\mathcal{P}$. This is consistent with the earlier assumption that any two pairs of vertices have equal communication frequency. Alternatively, we may assign each path an equal weight such that the total weight of all shortest paths is one.

B. Two Simple Models 

In this subsection, we consider two simple statistical models for generating the collection of paths $\mathcal{P}$. At a high level, both generative models assign each edge a probability (or multiple probabilities) such that the probability of each path can be derived by combining its edges' probabilities.

Edge Independent Model: In the first model, referred to as the edge independent model, each outgoing edge $(u \to v)$ of a vertex $u$ in the graph is assigned a probability $p(u \to v)$, and the probabilities of all outgoing edges of $u$ sum to one, i.e., $\sum_{(u \to v) \in \mathcal{O}(u)} p(u \to v) = 1$. Furthermore, we assume any two edges in a path are independent. Given this, the probability of a path $P = (v_0, e_1, v_1, e_2, v_2, \dots, v_{k-1}, e_k, v_k)$ is

$$p(P) = p(e_1) p(e_2) \cdots p(e_k) = p(v_0 \to v_1) p(v_1 \to v_2) \cdots p(v_{k-1} \to v_k)$$

Given the collection of paths $\mathcal{P}$, the overall likelihood $L_I(\mathcal{P})$ of these paths being generated from this model is

$$L_I(\mathcal{P}) = \prod_{P \in \mathcal{P}} p(P) = \prod_{e \in \mathcal{E}} p(e)^{N_e} \tag{1}$$

where $N_e$ is the number of paths in $\mathcal{P}$ passing through edge $e$. Note that if $\mathcal{P}$ includes all shortest paths in $G$, then $N_e$ is often referred to as the edge betweenness. To maximize the overall likelihood $L_I(\mathcal{P})$, using the Lagrange multiplier method, it is easy to derive that the optimal parameters are

$$p(e) = p(u \to v) = \frac{N_e}{M_u}$$

where $M_u$ is the number of paths going through $u$, i.e., reaching vertex $u$ and then continuing to one of its neighbors.

Clearly, there are a total of $2|E| = |\mathcal{E}|$ parameters in the edge independent model. Note that this model directly corresponds to the coding scheme where each edge $(u \to v)$ is assigned a unique code with length $\log \frac{1}{p(u \to v)}$. Thus, under this scheme, the overall minimal coding length for all paths in $\mathcal{P}$ is simply the negative log-likelihood, $-\log L_I(\mathcal{P})$.
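As a rough illustration of the estimate $p(u \to v) = N_e / M_u$, the following Python sketch builds one shortest path per vertex pair by breadth-first search and normalizes the edge counts per source vertex. The function names, the adjacency-dictionary input, the use of ordered vertex pairs, and the choice of computing $M_u$ as the sum of $N_e$ over the outgoing edges of $u$ are assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter, deque


def shortest_path(adj, s, t):
    """Return one shortest path from s to t as a vertex list (BFS), or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        if x == t:
            break
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    if t not in parent:
        return None
    path = []
    while t is not None:
        path.append(t)
        t = parent[t]
    return path[::-1]


def edge_independent_mle(adj):
    """Estimate p(u -> v) = N_e / M_u from one shortest path per ordered pair."""
    n_e = Counter()  # N_e: number of paths using directed edge e
    for s in adj:
        for t in adj:
            if s != t:
                p = shortest_path(adj, s, t)
                if p:
                    n_e.update(zip(p, p[1:]))
    m_u = Counter()  # assumed M_u: total count over the outgoing edges of u
    for (u, _), count in n_e.items():
        m_u[u] += count
    return {e: count / m_u[e[0]] for e, count in n_e.items()}


# Toy star graph (hypothetical data): hub 0 with leaves 1, 2, 3.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
probs = edge_independent_mle(adj)
```

With this normalization the probabilities of the outgoing edges of each vertex sum to one, matching the constraint of the edge independent model.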
Edge Markovian Model: The edge independent model is one of the simplest models for describing path probability. However, it is also rather unrealistic and results in poor model performance. A path inherently correlates its edges, so the edge probability model should take this into account. In the second model, we replace the independence assumption with the Markovian property, i.e., the probability of an edge is determined by its immediately preceding edge in a path. Given this, the probability of a path $P = (e_1, e_2, \dots, e_k)$ is rewritten as

$$p(P) = p(e_1) p(e_2 \mid e_1) \cdots p(e_k \mid e_{k-1})$$

where $p(e_j \mid e_i)$ is the conditional probability of edge $e_j$ appearing after edge $e_i$ in the path.

Given the path collection $\mathcal{P}$, the likelihood function for generating all these paths can be written as

$$L_M(\mathcal{P}) = \prod_{P \in \mathcal{P}} p(P) = \prod_{e \in \mathcal{E}} p(e)^{N'_e} \prod_{e, e' \in \mathcal{E}} p(e' \mid e)^{N_{ee'}}$$

where $N'_e$ is the number of paths in $\mathcal{P}$ starting with edge $e$, and $N_{ee'}$ is the number of paths with consecutive edges $(e, e')$. We note that this model directly corresponds to a Markov chain where each edge represents a state and the conditional probability $p(e' \mid e)$ represents a transition probability from state (edge) $e$ to state (edge) $e'$. However, though this model is more accurate at capturing the paths in graph $G$, it is also much more expensive in terms of the number of parameters. It requires $\sum_{v \in V} |\mathcal{I}(v)| \times |\mathcal{O}(v)| = \sum_{v \in V} |\mathcal{N}(v)|^2$ parameters.
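The counts $N'_e$ and $N_{ee'}$ can be gathered in a single pass over the path collection. The Python sketch below does this and then normalizes $N_{ee'}$ over the successors of each edge $e$ to obtain transition probabilities; this normalization is an assumption made by analogy with the edge independent estimate, and the function name and the vertex-list path representation are hypothetical.

```python
from collections import Counter


def edge_markov_mle(paths):
    """Count start edges and consecutive edge pairs, then estimate p(e' | e).

    paths: list of paths, each given as a list of vertices.
    Returns (n_start, trans), where n_start[e] plays the role of N'_e and
    trans[(e, e2)] is the assumed estimate N_{e e2} / sum over e3 of N_{e e3}.
    """
    n_start = Counter()
    n_pair = Counter()
    for p in paths:
        edges = list(zip(p, p[1:]))
        if not edges:
            continue
        n_start[edges[0]] += 1
        n_pair.update(zip(edges, edges[1:]))
    out_total = Counter()
    for (e, _), count in n_pair.items():
        out_total[e] += count
    trans = {pair: count / out_total[pair[0]] for pair, count in n_pair.items()}
    return n_start, trans


# Hypothetical path collection on a small star graph centered at vertex 0.
paths = [[1, 0, 2], [1, 0, 3], [1, 0, 2], [2, 0, 1]]
n_start, trans = edge_markov_mle(paths)
```

Each key of `trans` is a pair of directed edges, which makes the parameter growth visible: one entry per (incoming edge, outgoing edge) combination at every vertex.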

C. Bimodal Markovian Model 

We propose a new model with a reduced number of parameters to improve model performance. This model is motivated by an observation on the usage of highways in the transportation system: compared to complex local traffic, less information is needed to describe highway traffic. Mapping this back to modeling a path using the backbone structure, it suggests that subpaths consisting only of backbone edges can be described in a relatively coarse manner compared to subpaths containing non-backbone edges. Following this intuition, we introduce the Bimodal Markovian Model, which utilizes the backbone structure to reduce the number of parameters in the edge Markovian model while minimizing the loss of modeling accuracy. It is termed bimodal because all vertices and edges are categorized into two groups in the model.

Fig. 2: Example of backbone vertex u. (a) Backbone vertex u; (b) Probability table.


Figure 2 illustrates the basic idea of this model. In Figure 2(a), for each vertex $u$, all its incoming edges are divided into two categories, the backbone edges and the non-backbone edges. Each outgoing edge $e = (u \to v)$ from a backbone vertex $u$ has only two probabilities, conditional on the categories of incoming edges, denoted $p(e \mid B \cap \mathcal{I}(u))$ and $p(e \mid \bar{B} \cap \mathcal{I}(u))$, where $B$ is the set of all backbone edges and $\bar{B}$ is the set of all non-backbone edges ($B \cup \bar{B} = \mathcal{E}$). In other words, the multiple probabilities of edge $e$ conditional on different incoming edges are reduced to only two probabilities. From a clustering viewpoint, this can be considered as performing biclustering on each vertex's incoming edges.

It is worthwhile to note that this model only affects the conditional probabilities of directed edges starting from a backbone vertex in the edge Markovian model. In other words, we keep the conditional probabilities of the edge Markovian model for edges starting from a non-backbone vertex. For instance, consider the shortest path $P = (14, 1, 6, 5, 17, 18)$ highlighted in Figure 1(a). The probability of this path can be expressed by the bimodal Markovian model as

$$p(P) = p(14 \to 1)\, p(1 \to 6 \mid \bar{B})\, p(6 \to 5 \mid B)\, p(5 \to 17 \mid B)\, p(17 \to 18 \mid 5 \to 17)$$

Note the conditioning on the first edge entering the backbone and the first edge after leaving the backbone: even though $(1 \to 6)$ is a backbone edge, the probability we must use is $p(1 \to 6 \mid \bar{B})$, and though $(5 \to 17)$ is a non-backbone edge, its probability is $p(5 \to 17 \mid B)$. For the consecutive edges $(5 \to 17 \to 18)$, whose connecting vertex is not a backbone vertex, the conditional probability of $(17 \to 18)$ follows what we used in the edge Markovian model.

Given this, the overall likelihood for a given collection of paths $\mathcal{P}$ with respect to the backbone subgraph $B = (V_B, E_B)$ is described as

$$L_B(\mathcal{P}) = \prod_{P \in \mathcal{P}} p(P) = \prod_{e \in \mathcal{E}} p(e)^{N^s_e} \prod_{u \notin V_B,\, e \in \mathcal{I}(u),\, e' \in \mathcal{O}(u)} p(e' \mid e)^{N'_{ee'}} \prod_{u \in V_B,\, e \in \mathcal{O}(u)} p(e \mid B)^{N_{Be}} \prod_{u \in V_B,\, e \in \mathcal{O}(u)} p(e \mid \bar{B})^{N_{\bar{B}e}} \tag{2}$$

where $N^s_e$ is the number of paths in $\mathcal{P}$ starting from edge $e$, $N'_{ee'}$ is the number of paths with consecutive edges $(e, e')$ whose connecting intermediate vertex is not a backbone vertex, and $N_{Be}$ and $N_{\bar{B}e}$ denote the number of paths passing through $e$ when the preceding edge is a backbone edge or a non-backbone edge, respectively. In connection with the edge Markovian model, we observe that for a backbone vertex $u$ and its outgoing edge $e = (u \to v)$, $N_{Be} = \sum_{e' \in B \cap \mathcal{I}(u)} N_{e'e}$ and $N_{\bar{B}e} = \sum_{e' \in \bar{B} \cap \mathcal{I}(u)} N_{e'e}$, where $N_{e'e}$ is the number of paths in $\mathcal{P}$ containing the consecutive edges $(e', e)$.

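In our framework, the overall number of parameters in the bimodal Markovian model is $\sum_{v \notin V_B} |\mathcal{I}(v)| \times |\mathcal{O}(v)| + \sum_{v \in V_B} 2|\mathcal{O}(v)| = \sum_{v \notin V_B} |\mathcal{N}(v)|^2 + \sum_{v \in V_B} 2|\mathcal{N}(v)|$. Compared to the edge Markovian model, the saving in the number of parameters is $\sum_{v \in V_B} (|\mathcal{I}(v)| - 2) \times |\mathcal{O}(v)|$.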
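To make the parameter saving concrete, the short Python sketch below evaluates the three expressions above on a toy graph. The function name and the example graph are hypothetical; only the counting formulas come from the text.

```python
def parameter_counts(adj, backbone):
    """Return (edge Markovian count, bimodal Markovian count, saving).

    adj: adjacency dict of an undirected graph, so |I(v)| = |O(v)| = |N(v)|.
    backbone: set of backbone vertices V_B.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    markov = sum(d * d for d in deg.values())
    bimodal = sum(2 * d if v in backbone else d * d for v, d in deg.items())
    saving = sum((deg[v] - 2) * deg[v] for v in backbone)
    return markov, bimodal, saving


# Hypothetical graph: vertex 0 is a hub with four neighbors.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
markov, bimodal, saving = parameter_counts(adj, backbone={0})
```

Note that the saving is positive only for backbone vertices of degree greater than two, so high-degree vertices benefit most from the two-category conditioning.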

Given this, we formally define the optimal backbone discovery problem based on the bimodal Markovian model:

Definition 1: (Optimal K-Backbone Discovery Problem) Given a complex network $G = (V, E)$, the targeted path set $\mathcal{P}$ and the number of backbone vertices $K$, the network backbone $B = (V_B, E_B)$ is a connected subgraph with $|V_B| = K$ such that $L_B(\mathcal{P})$ in Formula (2) is maximized.

Note that in this definition, we allow the user to define the number of vertices in the backbone structure. Alternatively, treating this as a model selection problem, we may use a parameter penalty to help determine the optimal backbone size. For instance, if we use the Akaike information criterion (AIC), we simply want to minimize $-\log L(\mathcal{P} \mid B) + (|B| + \sum_{u \in V_B} |\mathcal{N}(u)|)$. If we use the Bayesian information criterion (BIC), our goal is to minimize $-2 \log L(\mathcal{P} \mid B) + (|B| + \sum_{u \in V_B} |\mathcal{N}(u)|) \log |\mathcal{P}|$, where $|\mathcal{P}|$ is considered to be the sample size. In this paper, our decision to study the optimal backbone model problem for any given number of vertices is based on several considerations. First, though backbone discovery can be treated as a model selection problem, in many applications the construction of backbones might involve other costs. Thus, it is more convenient and flexible to allow an adjustable number of vertices. Second, the solution of this problem forms the basis for solving under the AIC or BIC criterion, as we can compute solutions for different $K$ and then choose the overall optimal backbone. Finally, it is important and useful to observe how the backbone grows (shrinks) when its size increases (decreases). Therefore, in the remainder of the paper, we focus on studying the Optimal K-Backbone Discovery Problem. We will empirically investigate the relationship between backbone size and the likelihood of the bimodal Markovian model in Subsection VI-B.

Relationship to Path Coding Length: Before we work towards a solution for this problem, let us first confirm its relationship to the problem of optimizing path-recognition complexity (Section I). Given a backbone structure $G_B$, it is not hard to see that $-\log L_B(\mathcal{P})$ serves as the corresponding coding length of $\mathcal{P}$ in network $G$. When $L_B(\mathcal{P})$ is maximized, the optimal coding can be derived to describe network $G$.

In addition, recall that the coding scheme has two special codes, representing entering the backbone ("[") and exiting the backbone ("]"). To show how their code lengths are represented in the probabilistic model, let us take a look at the transition probability from a non-backbone edge to a backbone edge, $p(u \to v \mid \bar{B} \cap \mathcal{I}(u))$ with $(u \to v) \in B$, and the transition probability from a backbone edge to a non-backbone edge, $p(w \to z \mid B \cap \mathcal{I}(w))$ with $(w \to z) \in \bar{B}$:

$$p(u \to v \mid \bar{B}) = p'(u \to v \mid \bar{B}) \times p([ \mid \bar{B}_u), \quad \text{where } p'(u \to v \mid \bar{B}) = \frac{p(u \to v \mid \bar{B})}{p([ \mid \bar{B}_u)}, \quad p([ \mid \bar{B}_u) = \sum_{(u \to v') \in B} p(u \to v' \mid \bar{B})$$

$$p(w \to z \mid B) = p'(w \to z \mid B) \times p(] \mid B_w), \quad \text{where } p'(w \to z \mid B) = \frac{p(w \to z \mid B)}{p(] \mid B_w)}, \quad p(] \mid B_w) = \sum_{(w \to z') \in \bar{B}} p(w \to z' \mid B)$$

Here $\bar{B}_u = \bar{B} \cap \mathcal{I}(u)$ and $B_w = B \cap \mathcal{I}(w)$ correspond to the set of incoming non-backbone edges to $u$ and the set of incoming backbone edges to $w$. The table in Figure 2(b) illustrates $p([ \mid \bar{B}_u)$ and $p(] \mid B_u)$. In other words, the code length of "$[e_i$" ($e_i = (u \to v) \in B$) corresponds to the transition probability from a non-backbone edge to a backbone edge: $p([ \mid \bar{B}_u)\, p'(e_i \mid \bar{B}) = p(e_i \mid \bar{B})$. Similarly, the code length of "$e_j]$" ($e_j = (w \to z) \in \bar{B}$) corresponds to the transition probability from a backbone edge to a non-backbone edge: $p'(e_j \mid B) \times p(] \mid B_w) = p(e_j \mid B)$.

III. Backbone Discovery based on Vertex 
Betweenness 

In this section, we introduce a straightforward approach based on vertex betweenness and the minimal Steiner tree to discover a backbone. This approach also serves as the basic benchmark for backbone discovery. Recall that a network backbone is a connected subgraph carrying the major traffic formed by shortest paths. Meanwhile, vertex betweenness has been widely used to evaluate the importance of a vertex by the number of shortest paths passing through it. Given this, the straightforward solution for the optimal K-backbone discovery problem is to take the $K$ vertices with highest betweenness as backbone vertices, since they tend to capture more of the information flow following shortest paths. Ideally, if these vertices are connected in the graph, their induced subgraph naturally forms the backbone, and the edges of the induced subgraph are considered backbone edges.



However, these $K$ vertices are not necessarily connected to each other. To obtain a backbone structure, we want to build connections among them while introducing a minimal number of extra vertices. Since we focus on unweighted graphs in this paper, this problem is essentially an instance of the minimal Steiner tree problem. The minimal Steiner tree problem has been proved to be NP-hard, but an approximation algorithm has been introduced in [24]. Applying this method, we obtain a set of connected vertices that is a superset of the selected vertices. The corresponding induced graph is treated as the candidate backbone structure. To discover a backbone with exactly $K$ vertices, we use a refinement strategy that removes extra backbone vertices iteratively. In each iteration, we remove the vertex with the smallest vertex betweenness from the current graph. If the remaining graph would not be connected, we instead consider removing the vertex with the second smallest betweenness.

Algorithm 1 BackboneDiscovery_VB($G = (V, E)$, $K$)

Parameter: $G$ is the input network, $K$ is the backbone size
 1: Compute the vertex betweenness of each vertex;
 2: Select vertex set $V_S$ containing the $K$ vertices with largest vertex betweenness;
 3: Construct a minimal Steiner tree $T = (V_T, E_T)$ on vertex set $V_S$ (i.e., $V_S \subseteq V_T$) by the approximation algorithm of [24];
 4: $G_B$ is the induced subgraph of $G$ on vertex set $V_T$;
 5: $Q \leftarrow V_T$; {$Q$ is a queue which stores vertices in ascending order of their vertex betweenness}
 6: while $|V_T| > K$ do
 7:    $u \leftarrow Q$.pop_front();
 8:    $G'_B$ is the induced subgraph of $G_B$ on vertex set $V_T \setminus \{u\}$;
 9:    if $G'_B$ is a connected graph then
10:       $V_T \leftarrow V_T \setminus \{u\}$;
11:       $G_B \leftarrow G'_B$;
12:    else
13:       $Q$.push_back($u$);
14:    end if
15: end while
16: return $G_B$;




We sketch this approach in Algorithm 1. The algorithm consists of two main steps: the candidate backbone discovery step (Line 1 to Line 4) and the refinement step (Line 5 to Line 15). To begin with, the betweenness of each vertex is computed by a fast approach (Line 1). We then select the top $K$ vertices with largest betweenness and construct the minimal Steiner tree $T = (V_T, E_T)$ over this vertex set (Line 2 to Line 3). Now we have a set of connected vertices $V_T$, and its induced subgraph $G_B$ serves as the candidate backbone (Line 4). In the refinement step, we first store $V_T$ in a queue in ascending order of betweenness (Line 5). In each iteration of the main loop, we consider the first vertex $u$ in the queue (Line 7). If the removal of vertex $u$ and its incident edges would disconnect the current graph, we push $u$ back into the queue for further consideration (Line 13). Otherwise, the remaining graph $G'_B$ is used as the current graph $G_B$ in the next iteration. The refinement procedure proceeds until only $K$ vertices remain in the graph. The resulting graph is connected and returned as the backbone.
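The refinement step (Lines 5 to 15 of Algorithm 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the candidate vertex set and the betweenness scores are taken as given inputs (the betweenness computation and the Steiner tree approximation are not reproduced), and the names `is_connected` and `refine_backbone` are hypothetical. A failure counter is added as a safety guard so the loop cannot cycle forever; it is not part of the pseudocode.

```python
from collections import deque


def is_connected(adj, vertices):
    """Check whether the subgraph of adj induced on `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y in vertices and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen == vertices


def refine_backbone(adj, candidates, betweenness, k):
    """Remove low-betweenness vertices while keeping the induced graph connected."""
    current = set(candidates)
    queue = deque(sorted(current, key=lambda v: betweenness[v]))
    failures = 0  # guard: consecutive vertices that could not be removed
    while len(current) > k and failures < len(queue):
        u = queue.popleft()
        if is_connected(adj, current - {u}):
            current.remove(u)
            failures = 0
        else:
            queue.append(u)  # removal would disconnect the graph; retry later
            failures += 1
    return current


# Hypothetical example: path graph 0-1-2-3-4 with made-up betweenness scores.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = {0: 5, 1: 0, 2: 4, 3: 3, 4: 1}
backbone = refine_backbone(adj, adj.keys(), scores, k=3)
```

In this toy run, vertex 1 has the lowest score but is a cut vertex, so it is pushed back and the endpoint vertices are removed first.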
Computational Complexity: In the candidate backbone discovery step, we take $O(|V||E|)$ time to compute vertex betweenness and $O(K^2|V|)$ time to build the Steiner tree. For the refinement step, since we remove at most one vertex from $V_T$ in each iteration and connectivity checking on $G_B$ takes at most $O(|E_T|)$ per iteration, the total running time is $O((|V_T| - K)|E_T|)$ in the worst case. Putting both together, the overall computational complexity is $O(|V||E| + K^2|V|)$.

Instead of optimizing the likelihood function, this straightforward approach only employs vertex betweenness to approximate the contribution made by each vertex to $L_B(\mathcal{P})$. In addition, this approach neglects the effect of edges on the likelihood function by directly using the edge set of the induced graph as backbone edges. Therefore, the discovered backbone may not maximize the likelihood function $L_B(\mathcal{P})$. In the next section, we propose novel approaches that discover the backbone by directly considering the optimality of the likelihood function.

IV. Optimizing Bimodal Markovian Model 

In order to solve the optimal K-backbone discovery problem, two key questions should be considered: 1) Can we identify a set of $K$ connected vertices as candidate backbone vertices based on their contribution to the likelihood of the bimodal Markovian model? 2) For a given set of $K$ vertices, how do we extract an optimal backbone such that $L_B(\mathcal{P})$ is maximized? In other words, the edges in the induced subgraph of these $K$ vertices need to be classified as either backbone or non-backbone edges. It turns out that the second question is more fundamental, as it directly contributes to the solution of the first one. Simply speaking, the solution of the second question offers an effective way to estimate each individual vertex's contribution to the objective function $L_B(\mathcal{P})$ and thus is very helpful in selecting the backbone vertices.
Backbone Edge Selection of K Vertex Set: In the following, we introduce a log-likelihood representation of the objective function $L_B(\mathcal{P})$ to simplify the problem of discovering the optimal backbone given a set of $K$ vertices. Formally, let $V_B$ be a subset of connected vertices in $G$ and let $G[V_B]$ be the corresponding induced subgraph of $G$. The backbone graph is $G_B = (V_B, B) \subseteq G[V_B]$, i.e., the edges in the backbone can only come from the edges in the subgraph induced by vertex set $V_B$. Given path set $\mathcal{P}$, we want to extract the edge set $E_B$ from $G[V_B]$ that achieves maximum $L_B(\mathcal{P})$. This problem turns out to be rather challenging on undirected graphs due to the large search space and the edge consistency constraint that both directed edges $(u \to v)$ and $(v \to u)$ of an undirected edge $(u, v)$ must receive the same categorization. In fact, even without the consistency constraint, it is non-trivial to categorize an individual backbone vertex's incoming edges into backbone and non-backbone.



To facilitate maximizing $L_B(\mathcal{P})$ (Formula (2)), we compare it with the likelihood of the edge Markovian model, $L_M(\mathcal{P})$. First of all, we rewrite the likelihood of the edge Markovian model as

$$L_M(\mathcal{P}) = \prod_{e \in \mathcal{E}} p(e)^{N^s_e} \prod_{u \notin V_B,\, e \in \mathcal{I}(u),\, e' \in \mathcal{O}(u)} p(e' \mid e)^{N'_{ee'}} \prod_{u \in V_B,\, e \in \mathcal{I}(u),\, e' \in \mathcal{O}(u)} p(e' \mid e)^{N_{ee'}}$$




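Now, we introduce the likelihood ratio (the first two terms cancel out):

$$LR(V_B) = \frac{L_B(\mathcal{P})}{L_M(\mathcal{P})} = \prod_{u \in V_B} \frac{\prod_{e \in \mathcal{O}(u)} p(e \mid B)^{N_{Be}} \prod_{e \in \mathcal{O}(u)} p(e \mid \bar{B})^{N_{\bar{B}e}}}{\prod_{e' \in \mathcal{I}(u),\, e \in \mathcal{O}(u)} p(e \mid e')^{N_{e'e}}} \tag{3}$$

Given graph $G$ and path set $\mathcal{P}$, $L_M(\mathcal{P})$ is a constant, assuming each of its parameters $p(e \mid e')$ is optimized for the maximal likelihood. Therefore, maximizing the likelihood ratio $L_B(\mathcal{P}) / L_M(\mathcal{P})$ is equivalent to maximizing $L_B(\mathcal{P})$. The following definition formalizes our problem: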

Definition 2: (Optimizing Vertex Set Likelihood Ratio Problem) Given graph $G = (V, E)$ and path set $\mathcal{P}$, for any connected vertex set $V_B$, we would like to construct the backbone subgraph $G_B = (V_B, B)$, where $B \subseteq (V_B \times V_B) \cap E$ and $LR(V_B)$ (Formula (3)) is maximized.

A. Clustering Incoming Edges of Individual Vertex 

To approach this problem (Definition 2), we start by relaxing the consistency constraint. In other words, for directed edges $(u \to v)$ and $(v \to u)$ with opposite directions, we assume that each of them can be determined independently to be a backbone or non-backbone edge. Then, the following rule is applied to enforce the consistency constraint: we say $(u, v) \in E$ is a backbone edge iff both $(u \to v)$ and $(v \to u)$ are backbone edges.

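First of all, we rewrite $LR(V_B) = \prod_{u \in V_B} LR(u)$, or equivalently $\log LR(V_B) = \sum_{u \in V_B} \log LR(u)$, where

$$LR(u) = \frac{\prod_{e \in \mathcal{O}(u)} p(e \mid B)^{N_{Be}} \prod_{e \in \mathcal{O}(u)} p(e \mid \bar{B})^{N_{\bar{B}e}}}{\prod_{e' \in \mathcal{I}(u),\, e \in \mathcal{O}(u)} p(e \mid e')^{N_{e'e}}}$$
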



Based on the aforementioned relaxation, we can see that $LR(u)$ is independent of $LR(v)$ if $u \neq v$, i.e., the optimality of $LR(u)$ does not affect the optimality of $LR(v)$. Therefore, optimizing each $LR(u)$ for $u \in V_B$ individually results in the global maximization of $LR(V_B)$. Given this, our problem is converted to categorizing each vertex $u$'s incoming edges as backbone edges or non-backbone edges in order to optimize $LR(u)$. From the viewpoint of clustering, this essentially groups each vertex's incoming edges into only two clusters (backbone or non-backbone).

Before proceeding to our solution, we first study how to compute the optimal $LR(u)$ assuming the backbone edges are given. Essentially, we want to find the optimal $p(e \mid B)$ and $p(e \mid \bar{B})$ leading to maximal $LR(u)$. It is not hard to derive the following result:

Lemma 1: For a backbone vertex $u$, assuming each incoming edge has been categorized as backbone or non-backbone, i.e., $B$ and $\bar{B}$, the minimum of $-\log LR(u)$ is achieved when

$$p(e \mid B) = \frac{N_{Be}}{\sum_{e'' \in \mathcal{O}(u)} N_{Be''}} \quad \text{and} \quad p(e \mid \bar{B}) = \frac{N_{\bar{B}e}}{\sum_{e'' \in \mathcal{O}(u)} N_{\bar{B}e''}} \tag{4}$$

Proof Sketch: Minimizing $-\log LR(u)$ amounts to finding the maximum value of the likelihood function subject to certain probabilistic constraints. Using the Lagrange multiplier method with the two probabilistic constraints $\sum_{e \in \mathcal{O}(u)} p(e \mid B) = 1$ and $\sum_{e \in \mathcal{O}(u)} p(e \mid \bar{B}) = 1$, we are able to obtain the optimal values of $p(e \mid B)$ and $p(e \mid \bar{B})$. □

Algorithm for Edge Clustering: We propose an iterative refinement algorithm to solve this edge clustering problem on each vertex $u$. Initially, each incoming edge $e$ is randomly assigned to be a backbone edge or a non-backbone edge. Given such an assignment, the optimal value of $LR(u)$ can be computed from the corresponding optimal $p(e \mid B)$ and $p(e \mid \bar{B})$ (Lemma 1). In the subsequent iterations, we iteratively refine each edge's cluster membership in order to achieve a better value of $LR(u)$. The iterations terminate when no further improvement can be obtained. Interestingly, we will show that this method is essentially K-Means under the Kullback-Leibler divergence measure ($K = 2$).
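The iterative refinement just described can be sketched in Python for a single backbone vertex. This is an illustrative sketch under assumptions, not the authors' implementation: the input is a hypothetical count matrix whose entry `counts[i][j]` stands for $N_{e'_i e_j}$, every incoming edge is assumed to have at least one continuing path, the cluster centers use the count-ratio form of Lemma 1, and an empty cluster falls back to a uniform distribution.

```python
import math
import random


def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def centroid(counts, members, k):
    """Cluster center: pooled counts of the members, normalized to sum to one."""
    total = [sum(counts[i][j] for i in members) for j in range(k)]
    s = sum(total)
    return [t / s for t in total] if s > 0 else [1.0 / k] * k


def bi_kl_partition(counts, seed=0, max_iter=100):
    """Split incoming edges into two clusters; label 0 = backbone, 1 = non-backbone."""
    rng = random.Random(seed)
    n, k = len(counts), len(counts[0])
    # Each incoming edge e' becomes a point (p(e | e') for every outgoing edge e).
    points = [[c / sum(row) for c in row] for row in counts]
    labels = [rng.randrange(2) for _ in range(n)]
    for _ in range(max_iter):
        cents = [centroid(counts, [i for i in range(n) if labels[i] == c], k)
                 for c in (0, 1)]
        new = [min((0, 1), key=lambda c: kl(points[i], cents[c])) for i in range(n)]
        if new == labels:  # no membership change: converged
            break
        labels = new
    return labels


# Hypothetical counts: 4 incoming edges (rows) by 2 outgoing edges (columns).
counts = [[9, 1], [8, 2], [1, 9], [3, 7]]
labels = bi_kl_partition(counts)
```

Which of the two clusters ends up labeled 0 depends on the random initialization, so only the grouping itself is meaningful in this sketch.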

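To further explain the algorithm, we express $LR(u)$ in negative log-likelihood form as follows (for simplicity, $\mathcal{I}$ and $\mathcal{O}$ are used in place of $\mathcal{I}(u)$ and $\mathcal{O}(u)$):

$$-\log LR(u) = \sum_{e \in \mathcal{O}} \sum_{e' \in B \cap \mathcal{I}} N_{e'e} \log \frac{p(e \mid e')}{p(e \mid B)} + \sum_{e \in \mathcal{O}} \sum_{e' \in \bar{B} \cap \mathcal{I}} N_{e'e} \log \frac{p(e \mid e')}{p(e \mid \bar{B})}$$

$$= \sum_{e' \in B \cap \mathcal{I}} M_{e'} \sum_{e \in \mathcal{O}} p(e \mid e') \log \frac{p(e \mid e')}{p(e \mid B)} + \sum_{e' \in \bar{B} \cap \mathcal{I}} M_{e'} \sum_{e \in \mathcal{O}} p(e \mid e') \log \frac{p(e \mid e')}{p(e \mid \bar{B})}$$

where $M_{e'} = \sum_{e \in \mathcal{O}(u)} N_{e'e}$ is the total number of paths passing through edge $e'$ and then continuing to one of its neighbors.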

Indeed, $\sum_{e \in O} p(e|e') \log \frac{p(e|e')}{p(e|B)}$ simply corresponds to the well-known Kullback-Leibler divergence between the two distributions $(p(e_1|e'), \ldots, p(e_k|e'))$ and $(p(e_1|B), \ldots, p(e_k|B))$, where $e_1, \ldots, e_k \in O(u)$ and $k = |O(u)|$. Here, each incoming edge $e' \in I(u)$ corresponds to a point with $k$ features ($p(e|e')$, $e \in O(u)$) to be clustered. In addition, $p(e|B)$ and $p(e|\bar{B})$ are interpreted as the "centers" of the two clusters, the backbone cluster $B \cap I(u)$ and the non-backbone cluster $\bar{B} \cap I(u)$. In this sense, the objective function $-\log LR(u)$ serves as the within-cluster distance. We can therefore use K-Means-type clustering to categorize incoming edges into backbone edges and non-backbone edges. In each refinement iteration, each incoming edge is assigned to the cluster that yields the smallest KL divergence. The procedure is outlined in Algorithm 2. Clearly, this algorithm converges to a local minimum, as the classical K-Means algorithm does.

Algorithm 2 Bi-KL-Partition(Vertex u)
1: randomly partition the incoming edges of $u$ into $B$ and $\bar{B}$;
2: compute $p(e|B)$ and $p(e|\bar{B})$, $e \in O(u)$;
3: repeat
4: assign each incoming edge $e'$ to the cluster with the closest distance: $\min\left(\sum_{e \in O(u)} p(e|e') \log \frac{p(e|e')}{p(e|B)}, \sum_{e \in O(u)} p(e|e') \log \frac{p(e|e')}{p(e|\bar{B})}\right)$;
5: calculate the two new centroids $p(e|B)$ and $p(e|\bar{B})$ from the current assignment (Formula 3);
6: until stop criteria is satisfied

Fig. 3: Observations related to backbone vertices. (a) Undirected graph. (b) Bidirected graph.
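The two-cluster KL procedure of Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' C++ implementation: the function name `bi_kl_partition`, the `counts` matrix layout (row = incoming edge $e'$, column = outgoing edge $e$, entry = $N_{e'e}$), the optional `init` argument and the `eps` guard against zero centroid entries are all assumptions made for this sketch. Every row is assumed to have a positive total.

```python
import math
import random


def bi_kl_partition(counts, init=None, seed=0, max_iter=100, eps=1e-12):
    """2-means under KL divergence over the incoming edges of one vertex.

    counts[i][j] is assumed to hold N_{e'e}: the number of paths that enter
    through incoming edge i and leave through outgoing edge j.
    Returns (labels, centroids) with labels[i] in {0, 1}.
    """
    rng = random.Random(seed)
    n, k = len(counts), len(counts[0])
    mass = [sum(row) for row in counts]                      # M_{e'}
    points = [[c / m for c in row] for row, m in zip(counts, mass)]  # p(e|e')

    def centroid(labels, cluster):
        # Mass-weighted mean of the member points; None for an empty cluster.
        members = [i for i in range(n) if labels[i] == cluster]
        if not members:
            return None
        total = sum(mass[i] for i in members)
        return [sum(counts[i][j] for i in members) / total for j in range(k)]

    def kl(p, q):
        # KL(p || q); eps (an assumption of this sketch) avoids log of zero.
        if q is None:
            return math.inf
        return sum(pj * math.log(pj / max(qj, eps))
                   for pj, qj in zip(p, q) if pj > 0)

    labels = list(init) if init is not None else [rng.randrange(2) for _ in range(n)]
    cents = [centroid(labels, c) for c in (0, 1)]
    for _ in range(max_iter):
        new_labels = [0 if kl(points[i], cents[0]) <= kl(points[i], cents[1]) else 1
                      for i in range(n)]
        if new_labels == labels:          # no membership change: converged
            break
        labels = new_labels
        cents = [centroid(labels, c) for c in (0, 1)]
    return labels, cents
```

As with classical K-Means, the result depends on the initial partition, so in practice one would likely run several random restarts and keep the partition with the smallest objective.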

B. Clustering Undirected Edges 

Using the above clustering method, we convert the global optimization problem into the local problem of optimizing each vertex independently. Although the method is rather efficient, for an undirected edge $(u, v)$, the two directions $(u \to v)$ and $(v \to u)$ are not guaranteed to fall in the same category. Thus, such clustering only yields an upper bound on each individual vertex's benefit to the bimodal markovian model. This upper bound will be used for selecting candidate backbone vertices in subsection V-A.

To address the edge consistency constraint, we further explore the properties of edges incident to $V_B$. We observe that: 1) any undirected edge $e' = (u, u_1) \in E$ with only one endpoint $u$ in $V_B$ (see Figure 3(a)) is not a candidate to be a backbone edge; 2) for each undirected edge $e' = (u, v) \in E$ with both endpoints in $V_B$, we need $(u \to v)$ and $(v \to u)$ to be both in $B$ or both in $\bar{B}$ (as shown in Figure 3(b)). To distinguish those two types of edges, we decompose the likelihood ratio $LR(V_B)$ into three parts:



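\[
-\log LR(V_B) = \sum_{e' \in (E \setminus V_B \times V_B)} \sum_{e \in O(u)} N_{e'e} \log \frac{p(e|e')}{p(e|\bar{B})} + \sum_{e' \in V_B \times V_B \cap B} F(e'|B) + \sum_{e' \in V_B \times V_B \cap \bar{B}} F(e'|\bar{B}) \quad (5)
\]

where, for an undirected edge $e' = (u, v)$,

\[
F(e'|B) = \sum_{e \in O(u)} N_{e'e} \log \frac{p(e|e')}{p(e|B)} + \sum_{e \in O(v)} N_{e'e} \log \frac{p(e|e')}{p(e|B)}
\]

and $F(e'|\bar{B})$ is defined analogously with $p(e|\bar{B})$ in place of $p(e|B)$. In each sum, $e'$ stands for the direction of the edge that enters the corresponding endpoint.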

Algorithm 3 GBi-KL-Partition(Vertex Set V_s)
1: assign each edge with only one end in $V_s$ to cluster $\bar{B}$; randomly partition edges with both ends in $V_s$ into $B$ and $\bar{B}$;
2: for each vertex $u \in V_s$, compute optimal $p(e|B)$ and $p(e|\bar{B})$, $e \in O(u)$;
3: repeat
4: assign each edge $e'$ to the cluster with the closest distance: $\min(F(e'|B), F(e'|\bar{B}))$;
5: calculate new centroids of the two clusters for each vertex $u \in V_s$;
6: until stop criteria is satisfied

Note that the first term of Formula 5 relates to the probabilities of edges with one endpoint in $V_B$, while $F(e'|B)$ and $F(e'|\bar{B})$ consider the cases of edge $e'$ being a backbone edge and a non-backbone edge, respectively. Given this, we derive the following result:

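Lemma 2: For a connected vertex set $V_B$, supposing all edges of the induced graph $G[V_B]$ have been categorized as backbone or non-backbone, the minimum of $-\log LR(V_B)$ is achieved when

\[
p(e|B) = \frac{N_{Be}}{\sum_{e' \in O(u)} N_{Be'}} \quad \text{and} \quad p(e|\bar{B}) = \frac{N_{\bar{B}e}}{\sum_{e' \in O(u)} N_{\bar{B}e'}}
\]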



We apply the same Lagrange multiplier technique used for Lemma 1 to obtain the values of $p(e|B)$ and $p(e|\bar{B})$ that optimize $-\log LR(V_B)$. The probabilistic constraints are the same, i.e., $\sum_{e \in O(u)} p(e|B) = 1$ and $\sum_{e \in O(u)} p(e|\bar{B}) = 1$.
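The Lagrange multiplier step can be spelled out for a single cluster at a single vertex $u$. In this sketch, $C$ stands for the set of incoming edges currently assigned to one cluster and $q(e)$ for that cluster's center; both symbols are shorthand introduced here, not notation from the paper.

```latex
\begin{aligned}
&\text{minimize } J(q) = \sum_{e' \in C} \sum_{e \in O(u)} N_{e'e} \log \frac{p(e|e')}{q(e)}
 \quad \text{subject to } \sum_{e \in O(u)} q(e) = 1, \\
&\mathcal{L}(q, \lambda) = J(q) + \lambda \Big( \sum_{e \in O(u)} q(e) - 1 \Big), \\
&\frac{\partial \mathcal{L}}{\partial q(e)} = -\frac{\sum_{e' \in C} N_{e'e}}{q(e)} + \lambda = 0
 \;\Longrightarrow\; q(e) = \frac{1}{\lambda} \sum_{e' \in C} N_{e'e}, \\
&\sum_{e \in O(u)} q(e) = 1
 \;\Longrightarrow\; \lambda = \sum_{e'' \in O(u)} \sum_{e' \in C} N_{e'e''}, \\
&q(e) = \frac{\sum_{e' \in C} N_{e'e}}{\sum_{e'' \in O(u)} \sum_{e' \in C} N_{e'e''}} .
\end{aligned}
```

In words, the optimal center is the share of cluster traffic that continues onto edge $e$, which is the mass-weighted mean of the member distributions $p(e|e')$.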

The first term of Formula 5 is relatively stable across different backbone edge sets, as only $p(e|\bar{B})$ is involved. Therefore, the minimization of $-\log LR(V_B)$ can be approximately achieved by minimizing the last two terms of Formula 5, which serve as the within-cluster distance. Given this, we describe our generalized Bi-KL-Partition algorithm (Algorithm 3) to solve it, which again has an interesting convergence property. The basic idea is to update the cluster membership of each candidate backbone edge $e' = (u, v) \in V_B \times V_B \cap E$ based on $F(e'|B)$ and $F(e'|\bar{B})$, using the optimal $p(e|B)$ and $p(e|\bar{B})$ for the current clusters $B$ and $\bar{B}$. In other words, $F(e'|B)$ and $F(e'|\bar{B})$ describe a generalized "distance" function from a point (edge) $e'$ to the corresponding centroids.

Lemma 3: (Convergence Property) As we iteratively update the membership of each edge $e' \in V_B \times V_B \cap E$ in Algorithm 3, the function $-\log LR(V_B)$ converges to a local optimum in a finite number of iterations.

Proof Sketch: Let $F_i^1$ and $F_i^2$ denote the values of the objective function obtained from step 1 (Line 4) and step 2 (Line 5) at the $i$-th iteration of Algorithm 3, respectively. Clearly, $F_i^2$ records the value of the objective function at the end of iteration $i$. Assuming the algorithm has just finished iteration $i$, we show that the value of $-\log LR(V_B)$ in iteration $i+1$ is no greater than the value obtained from iteration $i$. Consider step 1 in iteration $i+1$: each edge $e' \in V_B \times V_B \cap E$ is reassigned to the cluster attaining $\min(F(e'|B), F(e'|\bar{B}))$, so its within-cluster distance does not increase, i.e., $F_i^2 \ge F_{i+1}^1$. With the backbone edge assignment fixed, step 2 minimizes the objective function by updating the centroids of the two clusters. In other words, we guarantee that $F_{i+1}^1 \ge F_{i+1}^2$. Combining both, we have $F_i^2 \ge F_{i+1}^1 \ge F_{i+1}^2$. In this sense, the value of $-\log LR(V_B)$ cannot increase as the number of iterations increases. On the other hand, the number of possible assignments of edges to backbone or non-backbone is bounded by $2^{|E_s|}$, where $E_s$ is the edge set of the graph induced by vertex set $V_s$. This implies that the number of iterations in Algorithm 3 is at most $2^{|E_s|}$. Putting both together, the lemma holds. □

V. Algorithms for Backbone Discovery 

In this section, we introduce two algorithms to discover the backbone with $K$ vertices for optimizing the bimodal markovian model. The first algorithm chooses a set of connected vertices as backbone vertices by certain criteria, and then discovers backbone edges among them in order to maximize $L_B(\mathcal{P})$. Interestingly, the first step can be converted to an instance of the maximum weight connected subgraph (MCG) problem [12]. However, the vertices selected by the first algorithm are not guaranteed to produce a "good" backbone. We thus propose a second algorithm that starts from the above backbone and iteratively refines it to achieve a better value of $L_B(\mathcal{P})$.

A. Backbone Discovery based on Maximal Weight Connected Subgraph

The optimality of the resulting backbone highly depends on the initially selected backbone vertices. Choosing "good" backbone vertices is a challenging problem, as it is impossible to determine the "goodness" of a set of backbone vertices in terms of $L_B(\mathcal{P})$ without the backbone structure. To tackle it, we use the upper bound of their contribution to $L_B(\mathcal{P})$ to approximate the true "goodness". More specifically, we assign each vertex a score which corresponds to the maximal contribution of this vertex to the likelihood. For a set of connected vertices, their overall weight (the sum of vertex weights) serves as an upper bound of their true likelihood. A larger upper bound potentially leads to a better true value, so we attempt to find a set of connected vertices with the maximal upper bound to approximate their true contribution. Then, the GBi-KL-Partition procedure is used to discover the backbone edges connecting them.
Upper Bound: Since most vertices will not be backbone vertices, we rewrite our target maximal likelihood as

\[
\log L_B(\mathcal{P}) = \log \frac{L_B(\mathcal{P})}{L_I(\mathcal{P})} + \log L_I(\mathcal{P})
\]

where $L_I(\mathcal{P})$ is the likelihood function of the edge independence model. Given this, we introduce the log-likelihood ratio $F(u)$, which represents the benefit of vertex $u$ being a backbone vertex:

\[
F(u) = \sum_{e \in O(u)} \left( N_{Be} \log \frac{p(e|B)}{p(e)} + N_{\bar{B}e} \log \frac{p(e|\bar{B})}{p(e)} \right) \quad (6)
\]

For simplicity, we omit the benefit of edge $e$ being the first edge in any shortest path (this portion is very small). It is easy to see that $\log \frac{L_B(\mathcal{P})}{L_I(\mathcal{P})} \approx \sum_{u \in V} F(u)$. In this case, we can invoke the Bi-KL-Partition procedure for each vertex $u$ and find its optimal bimodal markovian model. Meanwhile, the values of $p(e|B)$ and $p(e|\bar{B})$ for each edge $e \in O(u)$ are obtained. We can then calculate $F(u)$ for each vertex $u$ and assign it as the corresponding vertex weight in graph $G$.
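The vertex weight can be computed directly from the path counts once a partition of the incoming edges is known. The sketch below reads Formula 6 as the log-likelihood ratio between the bimodal model and the edge independence model at one vertex; that reading, the function name `vertex_benefit`, the `counts` layout and the caller-supplied `p_indep` (the independence-model probability $p(e)$ of each outgoing edge) are assumptions of this example.

```python
import math


def vertex_benefit(counts, labels, p_indep):
    """Log-likelihood ratio F(u) for one vertex (assumed reading of Formula 6).

    counts[i][j]: paths entering via incoming edge i and leaving via outgoing edge j.
    labels[i]:    1 if incoming edge i is a backbone edge, 0 otherwise.
    p_indep[j]:   p(e) of outgoing edge j under the edge independence model.
    """
    k = len(counts[0])
    benefit = 0.0
    for cluster in (0, 1):
        # Per-cluster continuation counts (N_{Be} for cluster 1, N_{\bar B e} for 0).
        n_cluster = [sum(row[j] for row, lab in zip(counts, labels) if lab == cluster)
                     for j in range(k)]
        mass = sum(n_cluster)
        for j in range(k):
            if n_cluster[j] > 0:
                p_cond = n_cluster[j] / mass     # p(e|B) or p(e|\bar B)
                benefit += n_cluster[j] * math.log(p_cond / p_indep[j])
    return benefit
```

When both clusters have the same outgoing distribution as `p_indep`, the function returns zero, which matches the intuition that such a vertex gains nothing from being modeled as a backbone vertex.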

Given this, the problem of choosing a set of connected vertices with the maximal sum of vertex weights is converted to an instance of the maximum weight connected subgraph (MCG) problem [12].

Definition 3: (Maximum Weight Connected Subgraph Problem (MCG)) Given a graph $G = (V, E)$ where each vertex has a weight $w(v)$, and a positive integer $k$, the maximum weight connected subgraph problem is to identify a connected subgraph $G' = (V', E')$ where $|V'| = k$ and $\sum_{v \in V'} w(v)$ is maximized.

The MCG problem has been proven to be NP-hard, but an 
efficient heuristic algorithm has been proposed to find a max- 
imal weight connected subgraph ff3l . We apply this heuristic 
method, termed MCG, on our vertex-weighted graph to 
extract a set of connected vertices as backbone vertices. 
Following that, we use the GBi-KL-Partition procedure to discover backbone edges for optimizing $L_B(\mathcal{P})$.
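To make the MCG step concrete, here is a deliberately simple greedy growth heuristic. It is not the heuristic from the cited work, only a stand-in that returns a connected vertex set of size $k$; the function name `greedy_mcg` and the adjacency-dictionary input format are assumptions of this sketch.

```python
def greedy_mcg(adj, weight, k):
    """Grow a connected vertex set of size k, always adding the heaviest neighbor.

    adj:    dict mapping each vertex to an iterable of its neighbors.
    weight: dict mapping each vertex to its weight, e.g. F(u).
    Returns the chosen vertex set (smaller than k if the component runs out).
    """
    start = max(weight, key=weight.get)      # seed with the heaviest vertex
    chosen = {start}
    frontier = set(adj[start])
    while len(chosen) < k and frontier:
        best = max(frontier, key=lambda v: weight[v])
        chosen.add(best)
        frontier |= set(adj[best])
        frontier -= chosen                   # frontier holds only unchosen neighbors
    return chosen
```

Because every added vertex is adjacent to the current set, connectivity holds by construction. The result is not guaranteed to be optimal, which is expected given that the problem is NP-hard.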



Algorithm 4 BackboneDiscovery(G = (V, E), K)
Parameter: G is input network, K is the backbone size
1: for each $u \in V$ do
2: invoke Bi-KL-Partition(u);
3: compute $F(u)$;
4: end for
5: $V_B \leftarrow$ MCG(V, K);
6: $B \leftarrow$ GBi-KL-Partition($V_B$);
7: $G_B \leftarrow (V_B, B)$;
8: return $G_B$;



The procedure to discover the backbone based on the maximal weight connected subgraph is outlined in Algorithm 4. To begin with, we invoke the Bi-KL-Partition procedure for each vertex $u$ to find its optimal bimodal markovian model and $p(e|B)$, $p(e|\bar{B})$ for each $e \in O(u)$ (Line 2). Then, we calculate $F(u)$ for each vertex $u$ as its weight in graph $G$ (Line 3). Following that, we run the MCG heuristic on the vertex-weighted graph $G$ to identify the backbone vertex set $V_B$ (Line 5). Finally, the GBi-KL-Partition procedure is applied to extract the backbone edges $B$ among $V_B$ (Line 6).
Computational Complexity: In the main loop, procedure Bi-KL-Partition dominates the computational cost. It takes at most $\sum_{u \in V} O(c|I(u)| \times |O(u)|) = O(\sum_{u \in V} c|N(u)|^2) = O(c|V|d^2)$ time, where $c$ is the number of iterations repeated in Bi-KL-Partition and $d$ is the average degree of vertices in $G$. Moreover, the heuristic algorithm of MCG takes $O(K^2|V|^2 + K^2|V||E|)$ time. Then, procedure GBi-KL-Partition costs $\sum_{u \in V_B} O(c'|N(u)|^2) = O(Kd^2)$ time, where $c'$ is the number of iterations repeated in GBi-KL-Partition. Putting these together and treating $c$ and $c'$ as small constants, the overall time complexity of this backbone discovery procedure is $O(|V|d^2 + K^2|V||E|)$.



However, the discovered subset $V_B$ with maximal total weight $\sum_{u \in V_B} F(u)$ does not necessarily consist of "good" backbone vertices for optimizing $L_B(\mathcal{P})$. This is because the vertex weight computation neglects the constraint that both directions $(u \to v)$ and $(v \to u)$ of an undirected edge $(u, v)$ must be either backbone edges or non-backbone edges together. In other words, once we apply procedure GBi-KL-Partition, $\sum_{u \in V_B} F(u)$ may decrease under the updated $p(e|B)$ and $p(e|\bar{B})$.

B. Backbone Discovery by Iterative Refinement 

To address the aforementioned issue, we propose a refinement strategy that improves the backbone iteratively. The basic idea is to first discover a subgraph as the search starting point by Algorithm 4, and then iteratively refine it by identifying an alternate backbone based on the current one. Specifically, in each iteration, we randomly abandon one vertex from the current candidate backbone and add a neighboring vertex with the maximal value of $F(u)$ (Formula 6) to form an alternate backbone. If the new backbone leads to a better value of $L_B(\mathcal{P})$, it is used as the current backbone for further refinement in the next iteration.



Algorithm 5 IterativeRefinement(G = (V, E), K)
Parameter: G is input network, K is the backbone size
{Step 1: Preprocessing}
1: invoke BackboneDiscovery(G, K) to obtain candidate backbone $G_B = (V_B, B)$;
2: for each $u \in V_B$ do
3: invoke Bi-KL-Partition(u);
4: compute $F(u)$;
5: end for
6: $W_H \leftarrow \sum_{u \in V_B} F(u)$;
7: $W_L \leftarrow \sum_{u \in V_B} F'(u)$; {$F'(u)$ is under the updated parameters of $p(e|B)$ and $p(e|\bar{B})$}
8: $W \leftarrow W_L$;
{Step 2: Iterative Refinement}
9: while $|V(G)| > K \wedge W_H > W$ do
10: $V_B \leftarrow V_B \setminus \{v\}$ and $V \leftarrow V \setminus \{v\}$; {randomly remove one vertex $v$ from $V_B$ and $G$}
11: $u \leftarrow \arg\max_{u \in N(V_B)} F(u)$;
12: $V_B \leftarrow V_B \cup \{u\}$;
13: $W_H \leftarrow W_H - F(v) + F(u)$;
14: $B \leftarrow$ GBi-KL-Partition($V_B$);
15: $W_L \leftarrow \sum_{u \in V_B} F'(u)$;
16: if $W < W_L$ then
17: $W \leftarrow W_L$;
18: $G_B \leftarrow (V_B, B)$; {keep the best result}
19: end if
20: end while
21: return $G_B$;



The overall procedure to discover the backbone by iterative refinement is outlined in Algorithm 5. It consists of two key steps: a preprocessing step that generates a candidate backbone by procedure BackboneDiscovery, and a refinement step that improves the backbone by local search. In Step 1, we invoke procedure BackboneDiscovery to provide a subgraph $G_B$ serving as the starting point of the refinement step (Line 1). For each vertex $u$ in $G_B$, we perform procedure Bi-KL-Partition to compute $F(u)$, which is assigned as the vertex weight (Line 2 to Line 5). The sum $W_H$ of those vertex weights represents the upper bound of the maximal likelihood achieved by $G_B$ (Line 6). Meanwhile, their true likelihoods are added together (i.e., $W_L$) to serve as the lower bound of the final backbone (Line 7). Moreover, $W$ denotes the true maximal benefit achieved so far (Line 8). In Step 2, we seek alternate backbone vertices iteratively to improve the likelihood. To speed up the search process, in each iteration we randomly remove a vertex $v$ from the current candidate backbone and from graph $G$, which eliminates it from further consideration (Line 10). Then, we add the neighboring vertex $u$ with maximal $F(u)$ to form the new backbone vertex set (Line 11 to Line 12). We update the upper bound $W_H$ and lower bound $W_L$ accordingly (Line 13 to Line 15). If a better backbone is found, it is kept as the best result so far (Line 16 to Line 19). The refinement loop terminates when either the remaining graph is too small or the upper bound is no larger than the true likelihood achieved so far (Line 9). Note that since Step 2 involves a random search process, we can invoke it multiple times and choose the overall best backbone as our final result.
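The control flow of Step 2 can be sketched as follows. The scoring functions are passed in as parameters so that the loop itself stays independent of how $F(u)$ and the true score are computed: `upper` (a dictionary of $F(u)$ values) and `true_score` (a callable standing in for running GBi-KL-Partition and summing $F'(u)$) are placeholders introduced for this example, as is the function name.

```python
import random


def iterative_refinement(adj, backbone, upper, true_score, k, seed=0):
    """Randomized local search over backbone vertex sets (sketch of Step 2).

    adj:        dict vertex -> iterable of neighbors.
    backbone:   initial backbone vertex set.
    upper:      dict vertex -> upper-bound weight F(u).
    true_score: callable(vertex_set) -> achieved score (stand-in for W_L).
    Returns (best_vertex_set, best_score).
    """
    rng = random.Random(seed)
    alive = set(adj)                          # vertices still in G
    current = set(backbone)
    best_set, best = set(current), true_score(current)
    w_high = sum(upper[v] for v in current)   # upper bound W_H
    while len(alive) > k and w_high > best:
        v = rng.choice(sorted(current))       # randomly drop one backbone vertex
        current.remove(v)
        alive.remove(v)                       # and eliminate it from G
        neighbors = {w for x in current for w in adj[x]
                     if w in alive and w not in current}
        if not neighbors:                     # nothing left to swap in
            break
        u = max(neighbors, key=lambda w: upper[w])
        current.add(u)
        w_high += upper[u] - upper[v]
        w_low = true_score(current)
        if w_low > best:                      # keep the best result so far
            best, best_set = w_low, set(current)
    return best_set, best
```

Since each iteration removes one vertex from the graph, the loop runs at most $|V| - K$ times. Like the pseudocode, this sketch does not check that the vertex set stays connected after a removal; a fuller implementation would likely need to handle that case.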

Computational Complexity: The cost of Step 1 is dominated by procedure BackboneDiscovery, which takes $O(|V|d^2 + K^2|V||E|)$ time, where $d$ is the average degree of vertices in $G$. In Step 2, each iteration takes $O(dK)$ time to select the best neighboring vertex to form an alternate backbone vertex set (Line 11), and $O(Kd^2)$ time to perform GBi-KL-Partition, supposing the number of iterations required in that procedure is a small constant. Given this, assuming $c$ iterations are needed in the refinement loop, Step 2 takes $O(cKd^2)$ time in total. Overall, the time complexity of the iterative refinement scheme is $O(|V|d^2 + K^2|V||E|)$.

Finally, we note that the above algorithms do not consider how to compute the path set $\mathcal{P}$ and derive the basic probabilities, such as $p(e)$ and $p(e|e')$ in the edge independence model and the edge markovian model, respectively. In the worst case, assuming the path set $\mathcal{P}$ includes the shortest path for each pair of vertices, the most straightforward way is to invoke the BFS procedure $|V|$ times, in $O(|V| \times (|V| + |E|))$ time. While enumerating these paths, we can compute $N_e$ (edge betweenness) and $N_{e'e}$ online, so there is no need to materialize the entire path set $\mathcal{P}$. The overall computational time is $O(|V| \times (|V| + |E|))$.
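The counting step can be illustrated with a short BFS-based sketch. It picks one shortest path per ordered vertex pair (the one given by the BFS tree) and accumulates how often each directed edge and each pair of consecutive edges is used. The function name `path_counts` and the choice to walk each path explicitly are assumptions of this example; the explicit walk is simpler to read but slower than the bound stated above, which would need subtree-size aggregation instead.

```python
from collections import defaultdict, deque


def path_counts(adj):
    """Count edge uses N_e and consecutive edge pairs N_{e'e} over BFS shortest paths.

    adj: dict vertex -> iterable of neighbors (undirected graph).
    Returns (edge_count, pair_count), keyed by directed edges (x, y) and by
    pairs of directed edges ((x, y), (y, z)).
    """
    edge_count = defaultdict(int)
    pair_count = defaultdict(int)
    for source in adj:
        parent = {source: None}
        queue = deque([source])
        while queue:                              # plain BFS from source
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent:
                    parent[y] = x
                    queue.append(y)
        for target in parent:
            if target == source:
                continue
            path = [target]                       # walk back to the source
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            path.reverse()
            edges = list(zip(path, path[1:]))
            for e in edges:
                edge_count[e] += 1
            for e_in, e_out in zip(edges, edges[1:]):
                pair_count[(e_in, e_out)] += 1    # path uses e_in then e_out
    return edge_count, pair_count
```

From these counts, $p(e|e')$ at a vertex follows by normalizing `pair_count` over the outgoing edges that follow a given incoming edge.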

VI. Empirical Study 

In this section, we evaluate the performance of the following backbone discovery algorithms:

1) the basic backbone discovery based on vertex betweenness (referred to as VB);

2) the backbone discovery approach based on the maximum weight connected subgraph (referred to as MCG);

3) the backbone discovery approach based on iterative refinement (referred to as ITER).

First, we study the performance of our methods from three aspects: modeling accuracy, parameter reduction, and the number of edges in the backbone. Modeling accuracy is measured by the ratio between the edge markovian model and the bimodal markovian model, expressed as the log-likelihood of the edge markovian model (EM) divided by the log-likelihood of the bimodal markovian model extracted by VB, MCG and ITER (denoted by EM/VB, EM/MCG and EM/ITER). The closer the modeling accuracy is to 1, the better the results our methods achieve. Then, we study the efficiency of our methods on large random graphs with power-law degree distribution. Finally, we perform case studies on co-author networks and a human protein-protein interaction (PPI) network to further demonstrate our approaches.

We implemented all algorithms using C++ and the Stan- 
dard Template Library (STL). All experiments were con- 
ducted on a 2.0GHz Dual Core AMD Opteron CPU with 
4.0GB RAM running Linux. 

A. Real Datasets 

We study the bimodal markovian model on three real-world datasets, including one biological network and two co-author networks from different research fields:

Yeast: The yeast protein-protein interaction network [4] includes 2361 vertices and 6646 edges. Each vertex indicates one protein and each edge denotes the interaction between two proteins. The network's average pairwise shortest distance is 4.4.

Net: The coauthorship network [13] of researchers who work in the field of network theory and experimentation, as collected by M. Newman. An edge joins two authors if and only if they have collaborated on at least one paper in this area. Since the entire network consists of several disconnected components, we extract the largest connected component, with 379 vertices and 1828 edges, for the experiment. The network's average pairwise shortest distance is 6.1.

DM: The co-author network in the field of data mining [21], which consists of 2000 researchers. Each of the 10615 edges indicates that the two authors have co-authored at least one paper. The network's average pairwise shortest distance is 4.6.

B. Performance of Bimodal Markovian Model 

In the following experiments, we investigate the performance of the bimodal markovian model from several aspects.
Preserved Modeling Accuracy: To verify how well the bimodal markovian model preserves the modeling accuracy of the edge markovian model while reducing its number of parameters, we apply all three approaches VB, MCG and ITER to the above 3 datasets. The backbone is supposed to be small, so we vary the number of vertices in the backbone from 5% to around 20% of the number of vertices in the original networks. Specifically, for Yeast, the number of vertices in the backbone varies from 100 to 600. For datasets Net and DM, the number of vertices in the backbone ranges from 20 to 155 and from 100 to 400, respectively. We make the following observations:

Figure 4(a), Figure 4(b) and Figure 4(c) show that the bimodal markovian models based on backbones discovered by both ITER and MCG largely preserve the modeling accuracy of the edge markovian model (referred to as EM). Their modeling accuracies are consistently better than that of the backbones discovered by VB. In particular, for datasets Yeast and DM, the ratios between EM and the bimodal markovian models based on backbones discovered by MCG and ITER are higher than or very close to 90% in most cases. Of the two methods, ITER, which uses the iterative refinement strategy, consistently achieves better results than MCG on all datasets. As the number of vertices in the backbone increases, the ratios of all methods slowly decrease in Yeast and DM. The more vertices are considered as backbone vertices, the simpler the model becomes (this is confirmed by Figure 4(g) and Figure 4(i)). This directly leads to a coarser representation of paths and a decrease of the bimodal markovian model's likelihood. Interestingly, unlike the consistently decreasing trend observed for EM/VB and EM/MCG, the results of ITER on Net do not consistently decrease as the number of vertices in the backbone increases. This phenomenon might be explained by two reasons: 1) our ITER method employs local search and thus reaches a local optimum rather than the global one; 2) a larger backbone may connect some important vertices, which simplifies the edge markovian model at a smaller loss of modeling accuracy. Therefore, it is reasonable to see the climbing trend from the data point corresponding to the 35-vertex backbone to that of the 50-vertex backbone.
Backbone Complexity: We evaluate the backbone complexity based on the number of edges in the discovered backbone. From Figure 4(d), Figure 4(e) and Figure 4(f), we can see that the backbones generated by MCG and ITER are rather sparse. Overall, the edge density (i.e., $|E|/|V|$) of the backbones discovered by both MCG and ITER is very close to or smaller than 2.5 on all three datasets. The edge density of the backbones in dataset Net is rather close to 1, which suggests that the discovered backbone has a tree-like structure. However, the edge density of the backbones discovered by VB is much higher, around 4 and 3.5 in Yeast and DM, respectively. In addition, though ITER achieves better results than MCG regarding the number of parameters (Figure 4(g), Figure 4(h) and Figure 4(i)), the number of edges in the backbones generated by ITER is not guaranteed to be smaller than that of MCG (see Figure 4(e) and Figure 4(f)). This is reasonable because the parameter reduction relies on the number of edges incident to backbone vertices and is independent of the number of backbone edges. In other words, for each backbone vertex $v$ with immediate neighbors $N(v)$, no matter how many incident edges are backbone edges, the number of parameters in the bimodal markovian model is fixed at $2 \times |N(v)|$.
Parameter Reduction: For all three datasets, we compare the parameter reduction ratio between the edge markovian model and the bimodal markovian models based on backbones discovered by VB, ITER and MCG in Figure 4(g), Figure 4(h) and Figure 4(i), respectively. The parameter reduction ratio is computed as $\frac{\#Param_{EM} - \#Param_{BM}}{\#Param_{EM}}$, where $\#Param_{EM}$ and $\#Param_{BM}$ denote the number of parameters used in the edge markovian model and the bimodal markovian model, respectively. As we can see, all three approaches VB, ITER and MCG dramatically reduce the number of parameters of the edge markovian model. Among them, it is interesting to see that VB outperforms both ITER and MCG in all settings. In VB, high-degree vertices tend to be selected as backbone vertices, since they have a high probability of lying on many shortest paths and thus have greater vertex betweenness. Therefore, more conditional probabilities of edges incident to these vertices are simplified than with the other two methods. In datasets Yeast and DM, VB on average reduces 73% and 66% of the parameters in the EM model, respectively. Both MCG and ITER also achieve good parameter reduction ratios. In Yeast and DM, even when only 5% of the vertices in the original graphs are backbone vertices, around half of the parameters in the EM model are removed while a large portion of the modeling accuracy is preserved. Also, MCG reduces more parameters than ITER in most cases. As mentioned before, when the number of vertices in the backbone increases, more incident edges' conditional probabilities tend to be simplified.

Fig. 4: Backbone discovery on real-world datasets

Finally, we note that in general, the right backbone size is application-dependent. Without any prior information, the experimental results on these 3 datasets suggest that using around 10% of the vertices in the original network as backbone vertices is a reasonable choice. Larger backbones incur significant losses of modeling accuracy, while smaller backbones yield only a modest parameter reduction.

Fig. 7: Running time for power-law graphs



C. Performance Study 

To verify the scalability of our approach, we test a set of random undirected graphs with power-law degree distribution. The graphs vary in size from 10K to 100K vertices and we set the edge density to 4. We specify each backbone to have 100 vertices.

We decompose the running time into two parts: preprocessing time (i.e., computing shortest paths and calculating edge or segment betweenness for the basic probabilities) and backbone discovery time. Figure 7 shows the backbone discovery time of VB, ITER and MCG for random graphs with power-law degree distribution. These results clearly demonstrate the scalability of approaches MCG and ITER. In particular, the running time of ITER is very close to that of MCG, because the extra computational cost of ITER (Algorithm 5) compared to MCG only depends on the size of the backbone, which is supposed to be small. This also confirms our time complexity analysis of ITER and MCG. Both are much faster than the straightforward method VB by a factor of 59, due to the high cost of building the minimal Steiner tree in VB. The preprocessing step, especially computing the pairwise shortest distances, is more expensive, as expected. The preprocessing time of all methods varies from 30 seconds to 121 minutes. We note that sampling seems to be an effective way to avoid the full pairwise computation and thus to speed up preprocessing. It is beyond the scope of this paper and will be investigated in future work.

VII. Case Studies 

In this section, we report the network backbones discovered by the ITER method in the co-author networks and the PPI network.
Co-author Networks (Net and DM): Figure 4(j), Figure 4(k) and Figure 4(l) show the discovered backbones with the number of backbone vertices varying from 25 to 50. These figures depict the backbone generation and growth process. In Figure 4(j), the backbone is a sparse subgraph with only 25 vertices. Comparing this backbone to the full connected component, we see that these vertices serve as the essential connectors among several "small world" components. As the backbone expands to 35 vertices in Figure 4(k), most of the earlier vertices are retained, while several important researchers, such as J. Kleinberg and P. Holme, are added. Then, the backbone is slightly expanded from Figure 4(k) to Figure 4(l). Another interesting observation is that some of the researchers in the backbone are not necessarily the best-known scientists, nor do they have a high number of collaborators. For instance, C. Edling, who is in the backbone, has only 5 collaborators in this network. In contrast to traditional research, which aims to discover highly correlated components, our backbone model studies complex networks from a new angle. The discovered backbone essentially captures the communication paths among different highly correlated communities.

From Figure 4(m), we can see that many of the discovered researchers, like Jiawei Han, Rakesh Agrawal and Christos Faloutsos, are prominent scientists in data mining. Compared to the relatively sparse backbones on Net, the backbone from the data mining co-author network is denser. This indicates different collaboration styles in different research fields. In the field of network theory and experiment, researchers tend to collaborate within small groups, while a few of them have connections among different groups. In the data mining area, however, many scientists work in several different directions, which results in more wide-ranging collaborations among them.

Footnote 1: A figure of the largest connected component can be found at http://www-personal.umich.edu/~mejn/centrality/

Fig. 5: Visualization of the 200 gene backbone for the PPI network. Red color indicates genes involved in at least 4 KEGG pathways. The larger the nodes, the more KEGG pathways they are involved in.

Fig. 6: Function and pathway enrichment analysis on the 200 backbone genes by IPA. Left: Top 15 enriched functional categories. Middle: Top 15 enriched canonical pathways. Right: Top 15 diseases and disorders.

Human PPI Network: We applied the backbone discovery algorithm (ITER) to the human protein-protein interaction (PPI) dataset obtained from [20] to identify the backbone of the human PPI network. This dataset consists of 3133 genes and 12298 edges indicating relationships among them. Our algorithm returned the genes in the backbone with the user-specified size (200 in our test). As shown in Figure 5, the backbone genes contain many well-known and important genes in cellular signal transduction pathways, including both kinases (e.g., RAF1, MAPK14, SRC and FYN) and receptors (e.g., TRAF6, PDGFRB), as well as signaling molecules such as PDGFB. Unlike traditional gene set discovery studies, for which we expect to obtain a group of genes with a small set of specifically enriched functions or pathways, we expect the backbone genes of the PPI network to be engaged in many different functions and possibly pathways. The functional and pathway analysis using tools such as the Ingenuity Pathway Analysis (IPA) indeed confirmed our expectation. As shown in Figure 6, the 200 genes are highly enriched with a wide spectrum of important biological functions and are related to many different diseases with high statistical significance. Moreover, they are involved in a large number of pathways, which is very rare for a gene list of this size. For the IPA canonical pathways, the 200 genes show enrichment with p-values (of the Fisher's exact test used by IPA) less than 0.0001 for more than 70 different pathways. These observations suggest that many backbone genes may be involved in more than one pathway. Indeed, we found that out of the 200 genes (of which 195 can be mapped to KEGG gene ids), 47 are involved in at least four KEGG pathways. This is a highly significant enrichment compared to the fact that a total of 1,100 such genes can be found among the entire genome of 19,076 annotated human genes in the KEGG database (p < 1.9 x 10 for hypergeometric test). It is conceivable that perturbation of these genes can lead to serious disruption of important biological functions, which implies their involvement in human diseases. This is also confirmed by Figure 6. Therefore, our experimental study of backbone discovery on the PPI network demonstrates the effectiveness of our approach and its great potential as a new gene ranking tool.
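The enrichment test mentioned above can be reproduced in outline with an exact hypergeometric tail. The sketch below assumes that the population is the 19,076 annotated genes, that 1,100 of them count as successes, that the 195 mapped backbone genes are the draws, and that the tail is taken at 47 or more; the text does not spell out these choices, so they are assumptions, as is the function name `hypergeom_tail`.

```python
from fractions import Fraction
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """Exact P(X >= observed) for a hypergeometric random variable X.

    population: total number of items (e.g. annotated genes).
    successes:  items with the property (e.g. genes in >= 4 pathways).
    draws:      sample size (e.g. mapped backbone genes).
    observed:   number of sampled items with the property.
    """
    upper = min(successes, draws)
    hits = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(observed, upper + 1))
    return Fraction(hits, comb(population, draws))


# Numbers taken from the text; the mapping to arguments is an assumption.
p_value = float(hypergeom_tail(19076, 1100, 195, 47))
```

Exact integer arithmetic is used so that the very small tail probability is not lost to floating-point underflow in intermediate terms.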

VIII. Conclusion 

In this paper, we introduce a new backbone discovery problem and propose novel discovery approaches based on vertex betweenness and KL-divergence. We believe the backbone approach opens a new way to study complex networks and systems, and also raises many new research questions for both data mining and complex network research: How do network backbone and modularity coexist, and how do they affect each other? How robust is the backbone, and how will it change? What information is carried in the backbone? We plan to work on these fascinating questions in the future.